A. BENCHMARKING DATABASE MACHINES


Benchmarks have long been used for comparisons of differing
installations and hardware architectures. Early on, instruction mixes
were formed and tested over various machines as a means of comparison
between installations. These early works included the Gibson [Ref. 1]
and Flynn [Ref. 2] mixes, which consisted of machine instructions
ordered by instruction class. The Gibson mix was based on data
collected from IBM 7090 installations, while the Flynn mix used
programs run at IBM System/360 installations. There has been some work
done with similar approaches [...] to obtain benchmark parameters.
These approaches involved the running of [...] in some high-level
language. They used the experimental results from these runs to
conduct an analysis of the computer system performance.


Benchmarking is a term used throughout the industry in a myriad of
differing contexts. In each case the ultimate goal is to make an
independent measure or relevant comparison of machine capabilities.
These comparisons or measures could be anything from the throughput to
the speed of calculations by a certain internal component, but in the
final analysis some measure or evaluation is produced.





There are many different ways of evaluating machine performance. Many
manufacturers provide the capability of attaching monitoring systems
to their equipment. These may be either hardware monitors, which sense
the action occurring in the system and keep statistics, or they may be
software monitors, which attempt to perform the same function with
software hooks that keep track of each operation and give the operator
a statistical analysis of the machine action and performance. Software
monitors have the disadvantage of using a good deal of the system time
just for their own operation. Though hardware monitors do not suffer
from this disadvantage, they require the wiring of the monitor system
into the hardware. The biggest disadvantage of these types of
measurements, however, is the inability to make comparisons on
differing machine configurations and between different manufacturers.


Benchmarks attempt to solve this problem by forming some standardized
testing methodology that is easily transportable from one machine to
another. More importantly, the measurements made must be relevant
regardless of the machines benchmarked and give an accurate means of
comparison between these machines.

Therefore, benchmarks are defined to be sets of instructions that will
test a machine and yield some generic statistics that give an accurate
measure of that machine in its specific configuration. This data will
then give the observer specific guidelines for making relevant and
general comparisons with similar machines and configurations.
2. Database Machine Benchmarks 


With the advent of special-purpose processors and backend database
machines, a new field of application for benchmarks exists.
Previously, benchmarks have been used exclusively for the testing and
performance evaluation of large general-purpose mainframes. With the
proliferation of backend processors to unload specialized tasks from
the mainframe, these benchmarks have been ineffective, because the
computer system's capabilities for specialized tasks are not
benchmarked. Our primary concern is with the benchmarking of
specialized backends known as database machines. In this context we
mean a specialized processor externally linked to a mainframe, with
its own special-purpose hardware and software for database management.
Backend refers to this externally-linked and specially-built machine.
3. The Objective


At present the backend database machine is in its infancy in the
commercial marketplace. Nevertheless, the database system is
extensively utilized in various forms and for different tasks,
exclusively in some software configuration operating on a large
general-purpose machine. In order to provide effective database
functions the software-laden database system consumes a great deal of
the mainframe's resources, which severely limits the usefulness of the
mainframe. The database machine takes over the searching and updating
of data in response to user queries. This greatly increases the
ultimate usefulness of the mainframe, since these backend database
machines are only a small fraction of the total system cost. The
database machines now on the market have been implemented using
microprocessor technology rather than fully-specialized hardware,
thereby keeping their costs down. As the market expands and more
progress is made in VLSI technology, we can expect to see more
specialized hardware at even lower cost.
Our objective here is to develop some basic testing procedures to
benchmark relational database machines. This thesis also gives an
account of test results performed on a specific backend database
machine, the RDM-1100, and its various configurations. Presented in
this thesis are the results of tests in the operations of selection
and projection and ordering capabilities. In addition to this thesis,
there are three other theses [Refs. 4, 5, 6], which describe in detail
the test procedures and results of join operations, [...] the
generation of the databases used in the test procedures and results.


B. THE BENCHMARKING ENVIRONMENT 


Our primary emphasis is to evaluate the performance of the
system/machine under typical operating conditions. In this sense a
standardized workload model is needed. This includes the use of
typical user queries in addition to the design of a database. To build
the database, we developed a parameterized database generator that
will generate our databases with attributes according to a specified
format and with values from well-defined domains according to specific
distributions. We chose this approach so that we could predict or
interpret accurately the results of any given query. More details are
given on the content and design of the database in Chapter II.


Query streams are developed to test the full range of possible user
operations. All queries are in forms of selection, projection, or join
operations as may be made by a typical user. The detailed syntax and
selection of query streams is discussed further in Chapter III.







In addition, the environment available to us for the test runs is very
restricted. There are no hardware or software probes available for
testing, nor any statistical information on the backend machine. Our
only recourse is to use a built-in retrieve function that will give a
readout of the database machine internal clock. Unfortunately, the
clock has a low resolution, 1/60 of a second. A system call is
executed to retrieve the time before and after each test query,
thereby providing a crude yet consistent time measure.
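
The before-and-after readings can be turned into an elapsed time as in the following minimal Python sketch. The function name, the sample tick values, and the assumption that each reading is a whole count of 1/60-second ticks are ours for illustration; they are not part of the RDM-1100 interface.

```python
TICKS_PER_SECOND = 60  # clock resolution stated in the text: 1/60 second


def elapsed_seconds(ticks_before, ticks_after):
    """Return (estimate, max_error) in seconds for one timed query.

    Both arguments are hypothetical clock readings expressed as whole
    ticks, taken just before and just after the query is issued.
    """
    if ticks_after < ticks_before:
        raise ValueError("clock reading went backwards")
    ticks = ticks_after - ticks_before
    # Each reading is truncated to a whole tick, so the difference of two
    # readings can be off by just under one tick in either direction.
    return ticks / TICKS_PER_SECOND, 1 / TICKS_PER_SECOND


# Illustrative readings only, not measured values.
estimate, error = elapsed_seconds(1200, 1290)
```

Since a single measurement is uncertain by up to one tick, the shortest queries are the least precisely timed, which is consistent with calling the measure crude yet consistent.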


The actual testing is done using a UNIVAC 1100/42 system. The system
is located at the Pacific Missile Test Center, Point Mugu, California.
The basic database machine used is the RDM-1100, which is a
Britton-Lee IDM-500 modified to run as a backend to UNIVAC 1100
computers by the Amperif Corporation of Chatsworth, California.

The testing is done using run-stream queries in an interactive
environment. These queries are run either on site or from a remote
terminal set up at the Naval Postgraduate School, Monterey. We prefer
to test queries in a stand-alone, single-user mode in order to
minimize the effects of workload variability of the host machine. In
the event that the queries are not run stand-alone, the number of
coincidental users is very low, and little or no difference is
observed in the measurements from one run to another.


2. The Host Interface 


The interface between the Univac and the RDM is via a byte/word
channel; the RDM is treated as an I/O device by the UNIVAC mainframe.
The standard IDM device is capable of communicating over an RS-232
serial interface or an IEEE-488 parallel interface. The communication
board of the RDM at Pt. Mugu has been modified to be compatible with
the Univac system. It supports a byte/word channel interface with a
200K byte/second capacity.

The driver routines on the Univac host handle the parsing of the user
queries, and translate them into the IDM internal format. The host
also handles the communication protocol with the backend machine. The
backend, in addition to performing the necessary handshakes, will
perform the required error checks and cause the host to retransmit in
the event that an error is detected.





The RDM has an internal cache memory, and has an optional accelerator
board. The accelerator is a high-speed processor designed to perform
certain common relational functions in order to increase the overall
system performance. The machine can be configured to hold 1-6
megabytes of information. We have made tests on the following
configurations:
(1) 1/2-megabyte cache without accelerator;
(2) 2-megabyte cache with accelerator;
(3) 2-megabyte cache without accelerator.


The first of these configurations is no longer marketed. The standard
package contains 1-megabyte of cache memory and no accelerator. In
addition, the machine used in our tests is linked exclusively to the
Univac 1100, and is equipped with only one disk controller, with
access to two 600-megabyte disks.







C. THE BENCHMARKED MACHINE


We choose to restrict our work to the IDM-500, a relational database
machine. This type of machine is relatively new on the database
market. Although it is not clear that it will be the predominant
database machine architecture, the latest literature and current
trends appear to indicate that it may play an important role, at least
in the short run.

The relational model is intuitively easier to use and understand than
other database models, a feature that may significantly contribute to
lower software development costs. Nevertheless, fully-implemented
software relational database management systems have severe
performance problems. The inherent cost of performing relational
operations, most notably the join and projection operations, underlies
the problem.
Given the advantages of relational models and the advances in
technology that allow special-purpose processors and backend systems
to handle the majority of work, we feel that the relational database
machine will play an important role in the database management market.
The Britton-Lee IDM-500 is one of the first machines to take advantage
of this technology and incorporate it into a relational database
system that serves as a backend to a variety of mainframes.


The Britton-Lee IDM-500 is a backend relational database machine that
can be linked to one or more host computers. Amperif Corp. markets
this system under an OEM agreement as the RDM-1100. Essentially, the
system is a Britton-Lee IDM-500 with Amperif providing the host and
backend interface software to communicate with the Univac 1100 and a
host-interface [...] architecture of the [...]. We will use IDM-500
and RDM-1100 interchangeably.


The backend is a modular, microprocessor-based system organized around
a central high-speed bus. Each module is functionally oriented.


1. Technology and Functionality of Modules


The RDM-1100 is made up of six basic modules organized on a central
high-speed bus (see Figure 1.1 again). The modules perform the
following functions:


a. The database processor


The database processor, a Z8000-based microprocessor, supervises and
manages all system resources. This processor executes most of the
software in the system.


b. The database accelerator


The database accelerator (an optional processor) is a high-speed
processor with an instruction set specifically designed to perform and
optimize certain functions. It is activated by the database processor
as appropriate. The accelerator has a three-stage pipeline and
executes instructions at up to 10 MIPS. It is [...] disk activity and
process data at disk transfer rates. The accelerator and the RDM
software are so configured that the majority of database work is
performed by the accelerator under the direction of the database
processor.
c. The main memory 
The RDM main memory, or cache memory, is composed of 64k-bit dynamic
RAM chips. The RDM can be configured with from 1-megabyte to
6-megabytes of memory. This memory is utilized for RDM system code,
disk buffering, indices, and user commands.
The entire system uses a [...]

The RDM can be configured with up to 4 disk controller modules. Each
controller can manage from one to [...] disk drives. The disk
controller moves data between external disks and the RDM main memory.
The disk controller is designed to work with the accelerator, which
can process data at disk transfer rates. The optional tape controller
module supports up to eight tape drives, which can be used for direct
disk-to-tape backup, [...], and RDM software loading.
f. The host interface

The mainframe host(s) communicate via the host interface module. This
module accepts commands from one of these hosts, performs error
checking, causes the host to retransmit if an error is detected, and
informs the database processor that it is moving a command into the
cache. Each host interface module can handle up to eight hosts. Hence,
with the full 8 interface modules, a maximum of 64 hosts can be
accommodated by the RDM. The standard interface module supports both
an RS-232 serial interface and an IEEE-488 parallel interface.
II. THE DATABASE


In our benchmark measures on the RDM-1100, it is important to model
the queries or transactions to be processed, and to model the
database. The performance of any database system depends not only on
the characteristics of the database system, but also on the size and
structure of the database. Considering this two-dimensional problem,
we decided to build databases where the values for the attributes
would be selected from well-defined domains. In addition, we decided
that these values should have specified and well-formed distributions
to aid in the prediction of the results for any given query.


We have built a parameterized relation generator, a software system to
generate relations for synthetic databases. These synthetic databases
are then used by our query stream to simulate the activity of actual
users on the system. Several of these databases are built, varying the
tuple widths as well as the number of tuples per relation. We also
attempt to distribute the databases on the disks to test specific
actions on the processor, such as join operations between relations on
the same or separate disks. In this manner we seek to find any
significant effects of the distribution and location of the data on
disk.


A. THE USE OF SYNTHETIC DATA


As with any system model, it is important that the synthetic data
adequately represent the essential characteristics of real databases.
With a synthetic database, we can represent a [...] database and save
time and space [...] of the real-world database. However, the
organization [...] can provide an emulation of the real world.
The synthetic databases we have designed include attribute types that
would exist in a real database: integer, character, and so on. For
attribute values we have incorporated sequential and random orders, as
well as groupings according to specific discrete distributions. These
are more fully described in the next section. By using this format, we
can not only accurately predict the outcome, the amounts of data
returned by a query, but we can also easily reproduce the databases on
other systems for further tests.


B. GENERATION OF THE SYNTHESIZED DATA 


When designing the database, our first concern is with the physical
sizes that should be used. The relations must be large enough to test
the full capacity of the system and meaningful enough to include
various attributes. For example, we choose tuple widths of 100, 200,
1000 and 2000 bytes, with the maximum tuple width being limited [...].

The second consideration is how large the relations should be, i.e.,
how many tuples per relation. Again, in order to test the system for
both large and small relations, we decide on relations with 500, 1000,
2500, 5000 or [...] tuples. These are arbitrary decisions. The
relation sizes are multiples of the smallest number in order to ease
comparisons of the test results.
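
To give a feel for the raw data volumes these choices imply, the short Python sketch below multiplies tuple width by tuple count. It is an illustration only: the 2000-byte width is taken from Figure 2.1, only the relation sizes of 500 to 5000 tuples are used, and storage overhead inside the database machine (indices, block headers) is ignored.

```python
TUPLE_WIDTHS = [100, 200, 1000, 2000]     # bytes per tuple
RELATION_SIZES = [500, 1000, 2500, 5000]  # tuples per relation


def relation_bytes(width, tuples):
    """Raw size of one relation in bytes, ignoring any storage overhead."""
    return width * tuples


# Raw volume for every (width, size) combination.
volumes = {
    (w, n): relation_bytes(w, n)
    for w in TUPLE_WIDTHS
    for n in RELATION_SIZES
}

# Each relation size expressed as a multiple of the smallest one.
ratios = [n // RELATION_SIZES[0] for n in RELATION_SIZES]
```

Under these assumptions the relations range from 50,000 bytes (100-byte tuples, 500 tuples) to 10,000,000 bytes (2000-byte tuples, 5000 tuples), and the sizes are 1, 2, 5 and 10 times the smallest relation.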


The next consideration is the actual design and building of the data
generation tool. We envision a great many databases with differing
configurations. Thus, an interactive interface to a generation program
appears to be the most effective approach. Using the locally available
IBM 3033 VM/CMS installation and PASCAL/VS as the language, an
interactive system is built. For more information on the programming
and documentation of this tool, please see [Ref. 6].
Using the interactive system, the user is allowed to define the format
of a relation in response to system prompts, on an
attribute-by-attribute basis. First, tuple width and relation size are
defined. The user is then allowed to 'add' attributes to the tuples
one after another until he reaches the desired limit.


The user can choose from several methods of attribute value
generation. Integer values can be sequential or random within a
specified domain. Uniqueness of the random integer can be assured. The
integer can be either one, two, or four bytes. Character strings can
also be chosen, either compressed or uncompressed, in a collating
sequence or in some random order. Character string values can also be
selected from enumerated domains [...] according to a specific
discrete distribution. In this prototype [...] discrete distributions
are limited [...]. The user is also given the opportunity [...] for
each relation and its [...]. [...] is designed and implemented with a
[...]. It is, however, modular for adding alternative methods to this
prototype, such as exponential or normal distributions.

We use a standard template for each tuple width. A portion of this
template is standard for each relation (see Figure 2.1). Each relation
contains: a sequential-integer attribute, a 4-byte integer, 'key'; a
character attribute 'mirror', which is identical in numerical value to
'key' but stored as a character string and not as an integer; a
random-integer attribute 'rand' of 4-byte integers; and a
character-string attribute 'chars' which contains
|    100 BYTES    |    200 BYTES    |   1000 BYTES    |   2000 BYTES    |
| FIELD     TYPE  | FIELD     TYPE  | FIELD     TYPE  | FIELD     TYPE  |
|-----------------|-----------------|-----------------|-----------------|
| KEY       I4    | KEY       I4    | KEY       I4    | KEY       I4    |
| MIRROR    C11   | MIRROR    C11   | MIRROR    C11   | MIRROR    C11   |
| RAND      I4    | RAND      I4    | RAND      I4    | RAND      I4    |
| UNIQRAND  I4    | UNIQRAND  I4    | CHARS     C63   | CHARS     C79   |
| CHARS     C4    | CHARS     C14   | P5        C9    | P5        C9    |
| LETTER    C1    | LETTER    C1    | P10       C9    | P10       C9    |
| P5        C9    | P5        C9    | P20       C9    | P20       C9    |
| P10       C9    | P10       C9    | P25       C9    | P25       C9    |
| P20       C9    | P20       C9    | P30       C9    | P30       C9    |
| P25       C9    | P25       C9    | P35       C9    | P40       C9    |
| P35       C9    | P30       C9    | P40       C9    | P50       C9    |
| P50       C9    | P35       C9    | P45       C9    | P60       C9    |
| P75       C9    | P40       C9    | P50       C9    | P70       C9    |
| P80       C9    | P45       C9    | P60       C9    | P75       C9    |
|                 | P50       C9    | P65       C9    | P80       C9    |
|                 | P55       C9    | P70       C9    | P90       C9    |
|                 | P60       C9    | P75       C9    | P100      C9    |
|                 | P65       C9    | P80       C9    | UP10      UC255 |
|                 | P70       C9    | P85       C9    | UP20      UC255 |
|                 | P75       C9    | P90       C9    | UP25      UC255 |
|                 | P80       C9    | P100      C9    | UP50      UC255 |
|                 | P85       C9    | UP10      UC255 | UP75      UC255 |
|                 | P90       C9    | UP25      UC255 | UP80      UC255 |
|                 | P100      C9    | UP50      UC255 | UP100     UC255 |

FIELD TYPES
C  - COMPRESSED CHARACTER STRING
     (MAXIMUM OF 255 CHARACTERS)
UC - UNCOMPRESSED CHARACTER STRING
     (MAXIMUM OF 255 CHARACTERS)
I4 - FOUR-BYTE INTEGER
     THIS FIELD MAY CONTAIN ANY INTEGER VALUE BETWEEN
     -2,147,483,648 AND +2,147,483,647

Figure 2.1 Tuple Templates.
characters in a collating sequence. The number of characters in
'chars' is dependent on the tuple width, so as to ensure that tuples
are exactly 100, 200, 1000 or 2000 bytes wide. The length of 'chars'
is set to the number of characters required to ensure that the tuple
is of the proper width. The 'rand' field is present to aid in
randomizing the order of the tuples, and the purpose of the 'mirror'
field is to compare the performance of identical retrieve operations
based on queries qualified on the sequential-integer attribute, 'key',
and the character attribute, 'mirror'. The 100-byte and 200-byte
tuples contain a sequential-unit-letter field of 1-byte characters in
collating sequence, 'letter', and a unique random-integer attribute of
4-byte integers, 'uniqrand'. The template is then filled out with
attributes for which the values are uniformly distributed over a
specified number of unique values. For example, the P10 attribute
specifies attribute values with a uniform distribution over ten unique
values. A retrieve statement with one qualifier could then be written
to retrieve 10% of the tuples in the relation. The number of such
fields is dependent on the tuple width.
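
The idea behind the template attributes can be sketched in a few lines of Python. This is not the PASCAL/VS tool described in [Ref. 6]; the function name, the dictionary layout, and the choice to assign P-attribute values by cycling through the key are assumptions made purely for illustration.

```python
import random


def generate_relation(n_tuples, p_values=(5, 10), seed=0):
    """Build rows with 'key', 'mirror', 'rand' and uniform P-attributes.

    For an attribute Pp, 100 // p distinct values are used, so each
    value appears in p% of the tuples when n_tuples is a multiple of
    100 // p. (Hypothetical helper, not the thesis generator.)
    """
    rng = random.Random(seed)
    rows = []
    for key in range(1, n_tuples + 1):
        row = {
            "key": key,                               # sequential integer
            "mirror": str(key),                       # same value as characters
            "rand": rng.randint(-2**31, 2**31 - 1),   # random 4-byte integer
        }
        for p in p_values:
            n_unique = 100 // p          # P5 -> 20 values, P10 -> 10 values
            row[f"p{p}"] = key % n_unique
        rows.append(row)
    return rows


relation = generate_relation(1000)
# A single qualifier on P10 selects one of the ten values.
selected = [r for r in relation if r["p10"] == 3]
fraction = len(selected) / len(relation)
```

With 1000 tuples, the single qualifier on `p10` matches 100 rows, i.e. 10% of the relation, which is the property that lets the amount of data returned by a test query be predicted in advance.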
Once the design of a database is complete, multiple instances of a
relation are built using the interactive generation tool on the IBM
3033. The relations are then copied to tape storage for transport to
Pt. Mugu and the UNIVAC 1100. The data is loaded onto the UNIVAC 1100
disks and then loaded to the backend database machine using a
bulk-load utility.

Tests are planned on the basis of an assumed capability to control the
distribution of the data on the RDM-1100 disks. The capability to
direct a relation to a specific disk is not implemented, although the
space allocated to a database can be spread across multiple disks. The
pattern of block allocation for relations within the database is
controlled within the database machine, and is not predictable.
The interaction between the user and the backend is through the
software interface, RQL (Relational Query Language), provided by
Amperif. [...] the user's RQL command into the [...] format and sends
the formatted [...]. The software requirement for the [...] backend
machine is independent [...].

When performing the tests, queries are grouped into run-streams in
order [...] available time. [...] has been very restricted. [...]
stand-alone, single [...] runs [...] host workload variability, [...]
run streams during the evenings and on weekends. [...] we want to run
sets of tests over [...] configurations. This again reduces the
overall [...] each configuration [...] to run our performance tests on
[...].

Additional constraints are imposed by the interface software provided
by [...] for the machine at Pt. Mugu [...] queries not supported.
[...] reduce variability in [...]. The stored-commands facility allows
the user to store [...] the parse-trees produced by the interpreter as
named commands. When these stored commands [...] the parsing [...]
pre-compiled [...] to look up target-list [...].
A. SYNTAX AND SEMANTICS


The basic operations involved in retrieving data in a relational
system are selection, projection and join. This section will provide a
basic overview of the Relational Query Language (RQL), with pertinent
examples. For a more detailed explanation of the language as well as
the database administrator functions, please refer to [Ref. 5]. This
thesis focuses exclusively on the selection and projection operations.
The interested reader is encouraged to read [Ref. 4] for an
explanation and evaluation of the join operations as performed on the
RDM-1100 and its various configurations.


Simple selection in RQL is expressed as follows:

    RETRIEVE (A.ALL) WHERE A.CITY = "CHICAGO"

The relation referred to in this case is A and the qualifier ALL
indicates that all attributes are to be returned. A qualification
consisting of a single predicate has been added, WHERE A.CITY =
"CHICAGO". This qualifier restricts the tuples returned to only those
tuples in which the city attribute has a value of "CHICAGO". The
qualifier could have multiple predicates, related by any of the
boolean operators, such as AND, OR, = (EQUAL), != (NOT EQUAL), etc. An
example is:

    RETRIEVE (A.ALL) WHERE A.CITY = "CHICAGO" OR A.CITY = "MONTEREY"

In this case the backend machine will return all the tuples of the
relation A in which the city attribute has the value "CHICAGO" or the
value "MONTEREY".


The selection operation restricts the tuples to be returned. The
projection operation restricts the attribute values of a tuple: only a
portion of the attribute values of each tuple are returned. For
example:

    RETRIEVE (A.CITY, A.NAME)

In this case, the target list (A.CITY, A.NAME) specifies the attribute
values to be projected out of the tuple and returned to the user. Only
the values of attributes CITY and NAME for each of the tuples in the
relation A will be returned. A qualifier (not shown) could be added as
in a previous example to limit the number of tuples returned to a
specific subset of the relation.
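
The semantics of the two operations can be mimicked on an in-memory relation with a few lines of Python. This is only a sketch of what the RQL commands above return, not of how the backend executes them; the helper names `select` and `project` and the three sample tuples are hypothetical.

```python
def select(relation, predicate):
    """Selection: keep only the tuples that satisfy the qualifier."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection: keep only the listed attribute values of each tuple."""
    return [{a: row[a] for a in attributes} for row in relation]


# Hypothetical relation A with NAME and CITY attributes.
A = [
    {"name": "ADAMS", "city": "CHICAGO"},
    {"name": "BAKER", "city": "MONTEREY"},
    {"name": "CLARK", "city": "DENVER"},
]

# RETRIEVE (A.ALL) WHERE A.CITY = "CHICAGO"
chicago = select(A, lambda r: r["city"] == "CHICAGO")

# RETRIEVE (A.ALL) WHERE A.CITY = "CHICAGO" OR A.CITY = "MONTEREY"
either = select(A, lambda r: r["city"] == "CHICAGO" or r["city"] == "MONTEREY")

# RETRIEVE (A.CITY, A.NAME)
city_name = project(A, ["city", "name"])
```

Combining the two, as in `project(select(A, ...), [...])`, corresponds to a target list with a qualifier attached.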
Commands like these make up the bulk of the queries used in the
selection and projection tests, with various qualifiers attached. RQL
has many more capabilities, such as the aggregate functions and the BY
clause. For more information, please refer to [Ref. 5].


B. TEST QUERIES
The test queries used are all selection and projection operations in
the form of the [...]. Qualifications are used on these queries to
select specific percentages of the attribute values in each tuple and
specific percentages of the tuples in each relation. As described in
Chapter II, single qualifiers are used on the attributes having
discrete distributions to select only a given percentage of each
relation. Comparisons are made on the backend database machine's
performance as the percentage of data retrieved is varied. This
variation covers two dimensions: the percentage of tuples in a
relation and the percentage of attribute values in a tuple. Additional
testing is done on single-tuple retrieves and on range predicates on
the key field. Each of these tests will be described in further detail
in [...] along with a detailed description of the commands used to
retrieve the data.


As mentioned before, the most critical restriction placed on the
performance tests is the lack of measurement tools. There are no
monitors available to keep track of CPU or I/O activities in the
backend database machine. The only available measurement capability is
a measurement of elapsed time that could be extracted from the backend
database machine clock, which has a resolution of 1/60 of a second.
Our concern in this performance evaluation is to determine the effects
of varying certain parameters on a backend database machine and gather
some gross overall measures. In this sense, therefore, we feel that
the rough measurements afforded by the backend machine are still
suitable for our purpose.
In order to determine the elapsed time of a query, we issue a command
to extract the time from the backend database machine before and after
each query. The retrieve command is of the form:

    RETRIEVE (TIME = GETTIME()) GO

On this [...] of the backend machine, the command is used to print a
time, in 1/60 second increments, before and after our queries.
Consequently, throughout our experiments we can get gross, yet
consistent measurements of total time required to execute the queries.
Despite the poor resolution, the comparison of identical queries will
yield relevant performance comparisons of the response time of the
backend machine.


The primary objective of these tests is not to generate large volumes
of data with figures of retrieval times for particular queries. Our
primary goal is to get a relevant model of the machine performance as
the queries are varied over specific parameters. To this end we hope
to make some [...] of the overall performance of this particular
backend database machine. [...] examples of the [...] along with
graphical [...] results.





IV. PERFORMANCE EVALUATION OF THE SELECTION OPERATION 


A. DEFINITION OF A SELECTION


Selection is a means for the user to retrieve and examine pertinent
information from a relation. The user may request the entire relation
or he may restrict the information returned to him in two ways. He may
limit the number of tuples returned by adding a qualification to the
selection operation. The qualification restricts the tuples retrieved
to those whose attribute values satisfy the conditions of the
qualification. Qualifications consist of predicates, assertions on the
attribute values of the tuple or tuples. Multiple predicates may be
combined with Boolean operators, such as AND, OR, EQUAL, NOT EQUAL.
The user may also restrict the attribute values returned by explicitly
listing the attributes desired. This is a projection of the relation.
[...] the following section.


B. SELECTIONS IN THE QUERY LANGUAGE


In RQL the user is given considerable power of selection through use
of the RETRIEVE command. Using the 100-byte relation described in
Figure 2.1 as a format for a relation A, a typical RQL selection
command might be:

    RETRIEVE (A.ALL) WHERE A.KEY = 25

In this command the keyword RETRIEVE is used to indicate a selection,
the A.ALL indicates that all attribute values, i.e., entire tuples,
are to be returned, and the keyword WHERE identifies the qualifier.
The A.ALL may be replaced with an explicit listing of those attributes
desired. The attributes may be listed in any order the user desires.
Using the keyword WHERE and a qualification, the user may then
indicate which of the tuples are to be returned. In this example, only
those in which the KEY field is equal to 25 are returned. The user may
use other operators such as < or > to build more complex
qualifications. For example,

    RETRIEVE (A.ALL) WHERE A.KEY > 25 AND A.KEY < 100

will return all tuples with the KEY field in the range 26 through 99.
The user is given great latitude in delimiting the subset of the
relation he desires. For more detailed information concerning the
capabilities and syntax, the reader is encouraged to read [Ref. 5].
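
Because the KEY attribute is a sequential integer, the number of tuples a range qualification returns can be worked out ahead of time. The Python sketch below checks the range example above; it assumes, for illustration only, a 500-tuple relation whose key values run from 1 to 500.

```python
# Sequential key values of a hypothetical 500-tuple relation.
keys = range(1, 501)

# RETRIEVE (A.ALL) WHERE A.KEY > 25 AND A.KEY < 100
matching = [k for k in keys if k > 25 and k < 100]

first, last, count = matching[0], matching[-1], len(matching)
```

Both bounds are strict, so the query returns keys 26 through 99, which is 74 tuples under these assumptions.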


C. AN ENVIRONMENT FOR THE MEASUREMENTS


[eet cctizs discussed in ‘shis Ghaptsr are from ‘*2ste 
mereamed Or the system configuration with 2-megabdyte cache 
Memory and ~he cptional accelerator. Lack of tim 1G 
meee gnificant number of tests on alternate ccenfiaqurations 
mer ccmparison. However, whesée tests can be cendu 


mene: COoniigurations without modifications. 


As described in Chapter III, the timing measurements are the
backend system's response to a retrieve for its internal system
clock time in 1/60-second resolution. In most cases the
measurements are based on single queries due to the time involved.
Some measurements are averages of several responses; these are
differentiated in the discussions that follow. In all cases the
tests are runs performed in the evenings and weekends with
virtually no other users on the system.
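
As a minimal sketch of how readings from a 1/60-second clock turn
into response times, the following Python converts tick counts to
seconds and averages several runs. The function names and the idea
of recording a (start, stop) tick pair per run are assumptions for
illustration; the text only states the clock resolution.

```python
TICKS_PER_SECOND = 60  # clock resolution stated in the text

def ticks_to_seconds(ticks: int) -> float:
    """Convert a count of 1/60-second clock ticks to seconds."""
    return ticks / TICKS_PER_SECOND

def average_response_seconds(start_stop_ticks) -> float:
    """Average elapsed time over several (start, stop) tick readings."""
    elapsed = [stop - start for start, stop in start_stop_ticks]
    return ticks_to_seconds(sum(elapsed)) / len(elapsed)
```

No single reading can be finer than one tick, which is one reason
to average repeated runs of short queries.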


D. SELECTION MEASUREMENTS


The figures in the f
ered for selections with
tuples returned is fr
the total number of tupl
involved. The final sec
ordering capabilities on th
backend, and the effects of data co

The Perce

Figures 4.1 and 4.2 show th


selection. Figure 4.1 shows measurements on a database with no
indices; Figure 4.2 shows measurements on a database with a
non-clustered index on the P5 and P10 attributes. As described in
Chapter II, the P5 and P10 attributes are attributes whose values
are in a uniform distribution over the respective percentage. The
P5 attribute values will be 20 unique values each appearing in 5%
of the tuples and the P10 values are 10 unique values each
appearing in 10% of the tuples. The queries are qualified on these
attributes so that, for example, the system will return exactly 5%
of the tuples in the relation.
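
The percentage attributes can be illustrated with a small Python
sketch. It assumes integer codes for the P5 and P10 values and a
simple cyclic assignment; the thesis relations may assign the values
differently, so this only shows why an equality qualifier on P5 or
P10 yields a known fraction of the tuples.

```python
def build_relation(n_tuples: int):
    """Give P5 one of 20 values and P10 one of 10 values, cycling evenly."""
    return [{"KEY": k, "P5": k % 20, "P10": k % 10} for k in range(n_tuples)]

def selectivity(relation, attr: str, value) -> float:
    """Fraction of tuples whose attribute equals the given value."""
    hits = sum(1 for row in relation if row[attr] == value)
    return hits / len(relation)

# Hypothetical 1000-tuple relation.
A = build_relation(1000)
```

Here `selectivity(A, "P5", 3)` is 5% and `selectivity(A, "P10", 3)`
is 10%, matching the 20-value and 10-value distributions described
above.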


As evident in Figure 4.1 the system response time increases nearly
linearly as the amount of data returned increases. As expected, the
larger the tuple size, the steeper the slope, since the volume of
the data increases more rapidly for the larger tuple size.

nearly linear relationship of the increasing response time to that
of the increasing volume of data. Further discussions

A clustered index on the KEY attribute causes the tuples to be
ordered by KEY for storage. A sparse index containing one entry per
block is built. The non-clustered index, on the other hand,
contains a unique entry for each tuple in the relation. No ordering
of tuples within the relation is implied.
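
The difference between the two index types can be sketched as
follows. This is a simplified model, not the backend's actual
structures: the function names and the `tuples_per_block` parameter
are hypothetical, and keys are assumed to be unique.

```python
def build_sparse_index(sorted_keys, tuples_per_block: int):
    """One entry per block: (first key in block, block number).
    Only usable when the tuples are stored in key order."""
    return [(sorted_keys[i], i // tuples_per_block)
            for i in range(0, len(sorted_keys), tuples_per_block)]

def build_dense_index(keys, tuples_per_block: int):
    """One entry per tuple: key -> (block number, slot in block).
    Implies nothing about the order in which tuples are stored."""
    return {key: divmod(pos, tuples_per_block)
            for pos, key in enumerate(keys)}
```

For 100 tuples at 10 per block, the sparse index has 10 entries
while the dense index has 100, which is the size trade-off behind
the clustered and non-clustered cases.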

Figure 4.3 shows response times for the retrieval

RETRIEVE ( A.ALL ) ORDER BY A.KEY

where A is the relation name and KEY is an attribute in A. In an
ordered retrieve, the tuples are sorted in the backend machine.
The queries are run against a relation with no index, a relation
with a non-clustered index on the KEY attribute, and a relation
with a clustered index on the KEY attribute. Response times are
similar throughout the range of relation sizes. The indices,
clustered or non-clustered, provide no significant improvement for
this range of relation sizes.
The expected results would have shown a significant improvement
for the relation with a clustered index. The backend apparently
still sorts the tuples, even though the tuples are already ordered
on the KEY attribute.

The graph shows a significant improvement in response times for the
relations with the non-clustered index. Looking at Figure 4.5, the
improvements are evident for simply qualified retrieves where the
index is on the attributes used in the predicates of the
qualification.

increase in the response time. The other tuple sizes show similar
improvements.
Data Compression on Selection Queries

The backend database machine has the capability of storing
character strings in either compressed or uncompressed format. A
character string in compressed format is stored on the disk with no
trailing blanks. The advantage is a savings in disk space. The
tradeoff is the increased CPU time required to compress and
uncompress the strings as data is moved to and from disk. Figure
4.6 shows the results of retrieves for the relations of 100-byte
tuple size and the 2000-byte tuple size, respectively. For the
100-byte tuple the storage requirement is reduced by approximately
50% when the attributes are fully compressed. In the case of the
2000-byte tuple size, the savings in storage is approximately 90%.
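
The compressed format described here (no trailing blanks on disk)
can be sketched in Python. The function names are hypothetical, and
the savings calculation ignores any per-field length or bookkeeping
overhead a real storage format would add.

```python
def compress(field: str) -> str:
    """Drop trailing blanks, as the compressed format is described."""
    return field.rstrip(" ")

def uncompress(field: str, width: int) -> str:
    """Pad the string back out to its fixed field width with blanks."""
    return field.ljust(width)

def storage_savings(fields, width: int) -> float:
    """Fraction of bytes saved over fixed-width storage of all fields."""
    stored = sum(len(compress(f)) for f in fields)
    return 1 - stored / (width * len(fields))
```

The CPU cost mentioned above corresponds to calling `compress` on
every write and `uncompress` on every read of a character attribute.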
The graph shows a major improvement in the response time for
compressed relations. From the steep slopes of the line it appears
evident that the greatest factor in response speed is the amount of
data that must pass over the bus. The large reduction in tuple size
for the compressed relation shows a clear advantage over the
uncompressed relation. This advantage becomes increasingly
significant for relations of larger tuple sizes. A delay factor of
10 for the larger tuple size and 10000-tuple relation is
observable.

Figure 4.7 shows th
backend system's sorting cap
are stored in the backend; th
KEY attributes. The graph depicts retrieves with and without
ordering specifications on the KEY attribute. There is a slight
increase in the response time for the ordered retrieves, as might
be expected. The differential line depicts the extra time necessary
for the ordering, which increases as the relation size increases.


Figure 4.8 shows the cost of performing the ordering on the backend
versus the host. In this case batch runs on the host are used to
perform the queries. In general, the batch retrieves show a marked
improvement in response time for identical queries over the
run-stream queries of Figure 4.7. This may be due to the decreased
overhead for batch versus an interactive environment. Figure 4.8
shows also that for smaller-size relations the backend performs a
more efficient ordering than the host. For larger relations the
sort time on the host and the time of the backend are comparable.


Finally, Figure 4.9 shows the effect of randomizing the order of
the tuples in the relation. Using the random-number attribute to
scatter the tuples in the relation, similar retrieves are performed
on the ordered and randomized relations. In this case there is a
non-clustered index on the KEY attribute for the relations. The
graph shows minor variances in response times between the two,
clearly indicating that the order in which the tuples are stored is
not a significant factor in response time for the ordered
retrieves.

E. CONCLUSIONS

The response times are generally linear, increasing as the amount
of data to be returned is increasing. The amount of data may be
varied as the number of tuples in a relation or the width of the
tuples.

The creation of indices on tuples shows significant improvement in
response times when the retrieve command is qualified on the
indexed attributes. The indices provide marked improvement as the
tuple size increases.

The effects of data compression show some interesting results.
Figure 4.6 has shown a very large improvement for compressed
tuples. This improvement is most likely attributable to the
decrease in the number of disk blocks accessed. In fact, the
difference in time is proportional to the decrease in the number of
blocks used for the tuples.
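
The link between tuple size and disk blocks accessed can be made
concrete with a short calculation. This sketch assumes tuples are
packed whole into fixed-size blocks and that a tuple never exceeds
one block. The 2048-byte block size used in the note below is a
hypothetical value chosen for illustration, not a figure from the
measurements.

```python
import math

def blocks_needed(n_tuples: int, tuple_bytes: int, block_bytes: int) -> int:
    """Blocks required when whole tuples are packed into fixed-size blocks."""
    if tuple_bytes > block_bytes:
        raise ValueError("this sketch assumes a tuple fits in one block")
    tuples_per_block = block_bytes // tuple_bytes
    return math.ceil(n_tuples / tuples_per_block)
```

Under these assumptions, 10000 tuples of 2000 bytes occupy 10000
blocks of 2048 bytes, while the same tuples compressed to 200 bytes
(a 90% saving) occupy 1000 blocks.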
V. PERFORMANCE EVALU


A. DEFINITION OF A PROJECTION 


Projection is a means to restrict the amount and to order the
sequence of information returned to the user in a retrieval
operation. More specifically, a projection will restrict the
attribute values that will be returned from each tuple selected.
Projection and selection can be combined to limit the range of
values returned. In addition, a user can rearrange the ordering of
the attribute values as the relation is displayed by varying the
order of the attribute names in the target list. This is not to say
that the actual order of the stored relation is altered but that
the subset displayed to the user is ordered according to his
specifications.
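
A projection as defined above can be sketched in Python. The
`project` helper and the sample attribute values are assumptions
for illustration. The point is that the target list controls both
which attributes come back and the order in which they are
displayed, while the stored relation is left untouched.

```python
def project(relation, target_list):
    """Return only the named attributes, in target-list order.
    The stored relation itself is not modified."""
    return [tuple(row[name] for name in target_list) for row in relation]

# Hypothetical two-tuple relation with three attributes.
A = [
    {"KEY": 1, "MIRROR": 9, "P5": "RED"},
    {"KEY": 2, "MIRROR": 8, "P5": "BLUE"},
]

key_first = project(A, ["KEY", "MIRROR"])
mirror_first = project(A, ["MIRROR", "KEY"])
```

Reversing the target list reverses the displayed column order
without changing how `A` is stored.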


B. PROJECTIONS IN THE QUERY LANGUAGE


In RQL the user is given considerable power to describe precisely
which attribute values he wants to be returned. Using the 100-byte
relation described in Table 2.1 as a format for a relation A, the
RQL command:

RETRIEVE ( A.KEY, A.MIRROR )


will return to the user only those attribute values in the relation
A whose attribute names are KEY and MIRROR. The user can list as
many attribute names as he desires and place them in any order in
the target list of the RETRIEVE command. In the case where all
attribute values of a relation are to be listed, the user may
simply use A.ALL. All attribute values, i.e., entire tuples, will
be returned in order as they are stored. The user may add
qualifiers to restrict the number of tuples returned. These
qualifiers need not be on the attributes listed. For example,

RETRIEVE ( A.KEY, A.MIRROR ) WHERE A.P5 = "RED"

will again return to the user only those attribute values in the
relation A whose attribute names are KEY and MIRROR. In addition,
the qualifier will restrict the tuples returned to those whose P5
attribute value is RED. This command also illustrates the means to
perform a percentage selection. The P5 attribute values are colors
selected from an enumerated set. Each color value in the P5
attribute is present in 5% of the tuples in the A relation. Using
these known percentages, the user can qualify a query to select
exactly 5% of the tuples in relation A.
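
The combined selection and projection can be sketched in Python,
with a qualifier on an attribute that is not in the target list.
The `retrieve` helper, the placeholder color names other than RED,
the MIRROR values, and the cyclic assignment of colors are all
assumptions made for the example.

```python
def retrieve(relation, target_list, where=None):
    """Apply the qualifier first, then keep only the target-list attributes."""
    selected = relation if where is None else [r for r in relation if where(r)]
    return [{name: row[name] for name in target_list} for row in selected]

# 20 color values; only RED is named in the text, the rest are placeholders.
COLORS = ["RED"] + [f"COLOR{i}" for i in range(1, 20)]

# Hypothetical 1000-tuple relation A.
A = [{"KEY": k, "MIRROR": 1000 - k, "P5": COLORS[k % 20]} for k in range(1000)]

# RETRIEVE ( A.KEY, A.MIRROR ) WHERE A.P5 = "RED"
rows = retrieve(A, ["KEY", "MIRROR"], where=lambda r: r["P5"] == "RED")
```

With 20 evenly distributed colors, `rows` contains 5% of the tuples,
each carrying only the KEY and MIRROR values.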


C. AN ENVIRONMENT FOR THE MEASUREMENTS


The measurements are taken on the same system configuration with
2-megabyte cache memory and the optional accelerator. Lack of time
has prevented us from obtaining measurements on other
configurations.

The projection measurements are conducted for four tuple sizes,
i.e., 100-byte, 200-byte, 1000-byte, and 2000-byte, and three
percentages of returns, 25%, 50%, and 75%. These percentages refer
to the number of attribute values in the tuple that is returned.
With the exception of the 100-byte tuple size, these are exact
percentages; in the 100-byte case, the number of attributes
returned was 29%. This is due to the tuples in the 100-byte
relation having 14 attributes. An exact percentage of 25% and 75%
was not obtainable. Nevertheless, they are still referred to as 25%
and 75% projections. Further, the retrieval commands are qualified
by 5% and 10% selections in order to reduce further the amount of
data to be returned. Each query is repeated 10 times, each time
with a different qualification. This draws on different parts of
the data in the relation and provides a better average response
time.
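
The arithmetic behind the inexact percentages for the 14-attribute
tuple can be checked with a few lines of Python. The helper names
are hypothetical. That the 29% figure corresponds to 4 of 14
attributes is an inference from the stated percentage, not
something the text says outright.

```python
from fractions import Fraction

def attribute_share(returned: int, total: int) -> Fraction:
    """Share of a tuple's attributes that a projection returns."""
    return Fraction(returned, total)

def exact_count(total: int, share: Fraction):
    """Whole number of attributes giving exactly `share`, or None."""
    n = share * total
    return int(n) if n.denominator == 1 else None
```

For 14 attributes, `exact_count` finds 7 attributes for a 50%
projection but returns None for 25% and 75%, since those would need
3.5 and 10.5 attributes. Four of 14 attributes is about 29%.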

D. PROJECTION MEASUREMENTS


The test queries used are qualified on the P5 and P10 attributes of
the relation to perform the aforementioned selection. Each query is
then repeated 10 times with a different qualifier. The figures
represent the average response time for those ten tests. Each graph
shows the response time in seconds plotted against the number of
tuples in the relation.
1. Percentage of Projection


In general the difference between the five-percent and ten-percent
selections is negligible; this is particularly true for the
smaller-size relations. Doubling the number of tuples returned in a
query can result in approximately a 20% increase (i.e., 1/3 second
increase in the response time on the average) in the smaller tuples
and a 10% increase (i.e., 7 seconds on the average) in the larger
tuples. Figures 5.1 and 5.2 show the results of a 25% projection
over varying tuple widths, with Figure 5.1 for a 5% selection and
Figure 5.2 for a 10% selection. As can be seen, the graphs in these
two figures are nearly identical. This is also the case for the
graphs on the 50% and 75% projections. For example, in Figures 5.3
and 5.4, similar graphs for the 5% selection with 50% and 75%

is used in the qualifier for the 10 queries.

smaller than any projection time, which indicates that in the
smaller tuples the backend does a strict select before extracting
the attribute values specified in the projection qualifier. As the
tuple width increases, the full select may take more time than that
of the projection. For the 200-byte tuple in Figure 5.8, the full
select time is again nearly linear, and the times are slightly more
than the times for a 25% projection. The difference in response
between the full select and the 25% projection steadily increases
as the relation size increases, but even so the full select is
faster than the 50% and 75% projections.
For the larger-width tuples, Figures 5.9 and 5.10 show that the
full select time is higher than the time for the small percentage
projections. The full select, however, has a much smaller slope,
thereby crossing the line of the projection time and eventually
showing a trend of quicker response as the relation size increases.
Of particular note is the uniformity of the curves for the varying
projections in the 1000-byte and 2000-byte tuples of Figures 5.9
and 5.10. Whereas for the smaller tuples the lines are nearly
linear with increasing slopes, the lines for the larger tuples are
near linear and the slopes are very even.
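
The crossing described for Figures 5.9 and 5.10 can be reasoned
about with two straight-line models of response time against
relation size. The sketch below is a generic calculation under that
linearity assumption. The function name is hypothetical, and the
coefficients in the note are arbitrary illustrative numbers, not
measurements.

```python
def crossover_size(intercept_a: float, slope_a: float,
                   intercept_b: float, slope_b: float):
    """Relation size at which two linear response-time models are equal.
    Returns None when the lines are parallel and never cross."""
    if slope_a == slope_b:
        return None
    return (intercept_b - intercept_a) / (slope_a - slope_b)
```

For example, a line starting at 10 with slope 1 (higher start,
gentler slope, like the full select) meets a line starting at 2
with slope 3 at a size of 4; beyond that point the first line gives
the smaller response time.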


E. CONCLUSIONS 


In general, the projection results are very predictable in that
the response time is nearly linear and the response time increases
as the amount of data returned increases. The amount of data may be
determined by either the relation size or the projection size.

The full-select comparisons in Figures 5.7, 5.8, 5.9, and 5.10, on
the other hand, show some unanticipated results. Instead of showing
a clear advantage in the projections, as might be expected, the
results vary with the tuple width. For the smaller tuple width as
depicted in Figure 5.7, the full select appears to run faster even
though the amount of data returned is greater. For the 200-byte
tuples as depicted in Figure 5.8, the relationship is markedly
different. For the larger tuples as graphed in Figures 5.9 and
5.10, the full select requires more time for the smaller relations.
Nevertheless, its advantage becomes evident as the relation size
increases. In summary, the full-select operation is sensitive to
the width of the tuples. In other words, the greater the tuple
width, the higher its response time becomes. The full-select
operation is also sensitive to the size of the relations, although
in an opposite way.
That is, the larger the relation, the s
It is difficult to determine what effect the cache and
other configurations
A need exists for more
widths and relation sizes in hopes of o
the relationshi
projections as the widths and sizes vary.

A. OVERALL OBSERVATIONS OF THE MACHINE PERFORMANCE


The experiments described in Chapters IV and V show some
predictable results as well as some unexpected surprises.
Generally the simple select operations, with or without indices,
display expected trends. The response time increases as the amount
of data to be returned increases, as shown in Figures 4.1 and 4.5.
A similar trend is seen for relations with compressed attribute
values. As Figure 4.6 illustrates, reduction in the response time
is significant for the large tuple widths where the degree of
compression is high. The relations with indices show expected
improvements in the response time for retrieves qualified on these
attribute values.

Some unexpected results, however, are seen for the test results
dealing with ordered retrieves. In Figure 4.8 the backend shows an
unexpected superiority in sorting over the host for smaller-size
relations. Even for the large relations, up to 10000 tuples, the
backend maintains a response time comparable with the host. One
would expect that the mainframe would have a significant advantage
in computing power and show a major improvement when the relation
is ordered in the host instead of in the backend.

Another interesting result is the effect of clustered and
non-clustered indices on ordered retrieves. Creating a clustered
index on a relation will cause the tuples to be stored in a
specific order while a non-clustered index does not imply any
ordering of the tuples. Figure 4.3 shows very similar response
times throughout the range of relation sizes, regardless of whether
the index is clustered or non-clustered. This implies that the
retrieved tuples are sorted even when a clustered index is present.


The projection of tuple attributes shows predictable results.
Throughout the tests for differing projection percentages and tuple
widths, the graphs display near linearity in both dimensions. The
response time increases as the tuple width or the number of tuples
returned increases. The surprising results are evident when
comparing projection to full selection.


Consider Figures 5.7, 5.8, 5.9, and 5.10 again. As discussed in
Chapter V, the overlay of the full select on the varying projection
sizes shows no positive trend. The projection measurements are
consistent throughout the figures, yet the full select's
relationship to the projections varies from one figure to the next.
Two of the four figures indicate that it is cheaper to retrieve
entire tuples than to project attribute values from some tuple. One
figure indicates that beyond a fixed relation size, it is cheaper
to retrieve entire tuples. The fourth figure seems to indicate that
some degree of projection is always cheaper than retrieving the
entire tuple. No obvious conclusion can be drawn. More tests over a
wider range of tuple widths are required to identify an overall
trend or relationship between projection percentage and the
full-selection retrieves.
B. DATABASE AND MACHINE LIMITATIONS

When considering the test environment, two specific limitations
stand above all else. The first of these is the time resolution of
the clock from which measurements are taken. The standardized use
of the GETTIME function throughout the tests has made comparison of
various test results over differing periods meaningful. Even so,
the low resolution makes averaging times over many similar test
runs a necessity. This greatly limits the amount of time that one
can spend in running more tests and in verifying previous results.
A great effort has been made to find some other timing mechanism.
In the end, GETTIME proves to be the easiest to use, the most
consistent, and, most importantly, the easiest to control.
The second limitation concerns the system configuration and the
inability to control the environment of both the host and the
backend. The performance of these tests has not been a very high
priority of the parent command at Pt. Mugu. This is to be expected,
since the host machine is in a production environment. Gaining
exclusive use is very difficult and extremely costly. As a result,
the tests are limited to weekend and evening runs, at times of
relatively low activity. This has significantly reduced the system
availability. Also, in terms of the environment, the backend system
we used is a relatively new piece of equipment. The system
configuration has been changing during the experimentation, and the
time each configuration becomes available has been limited.
Consequently, not enough data can be collected for significant
comparisons.
C. RECOMMENDATIONS FOR FUTURE BENCHMARKING EFFORTS
In light of the test results discussed here, the direction of
future work should be toward effects of various indices and
ordering capabilities. The results of tests on various types of
indices and the ordering of relations show the most startling
results. In addition, some work is required over a wider range of
tuple widths to refine previous results.

Another aspect that warrants research is a mix of tests under a
more realistic system load, specifically tests to simulate a system
with multiple users of the backend and a more realistic host
workload. The tests in this thesis are runs on an unloaded system.
In actuality, the use of the system will most likely occur closer
to peak loading. Perhaps different trends may develop when the host
and/or backend are subjected to different load conditions.

Even though these tests are on a specific system, they are general
enough in nature to provide insight for testing other relational
machines and to aid in making a comparison of different backends.